STATISTICAL ANALYSIS OF THE HIRSCH INDEX 



By Luca Pratelli*, Alberto Baccini†, Lucio Barabesi†
and Marzia Marcheselli†

Accademia Navale di Livorno* and Università di Siena†

The Hirsch index (commonly referred to as the h-index) is a bibliometric
indicator which is widely recognized as effective for measuring the
scientific production of a scholar, since it summarizes the size and impact
of the research output. In a formal setting, the h-index is actually an
empirical functional of the distribution of the citation counts received by
the scholar.

Under this approach, the asymptotic theory for the empirical h-index has
recently been developed for citation counts following a continuous
distribution and, in particular, variance estimation has been considered
for the Pareto-type and the Weibull-type distribution families. However, in
bibliometric applications, citation counts display a distribution supported
by the integers. Thus, we provide general properties for the empirical
h-index under the small- and large-sample settings.

In addition, we introduce consistent nonparametric variance estimation,
which allows for the implementation of large-sample set estimation for the
theoretical h-index.



1. Introduction. The h-index was introduced by Hirsch (2005) as a research
performance indicator for individual scholars. The h-index is designed as a
single score balancing the two most important dimensions of research
activity, i.e. the productivity of a scholar and the corresponding impact
on the scientific community. Indeed, according to the original definition
of the empirical h-index provided by Hirsch (2005), "a scientist has index
h, if h of his or her N_p papers have at least h citations each, whereas
the other (N_p - h) papers have no more than h citations each".

Although the h-index has been proposed only recently, it is increasingly
being adopted for evaluation and comparison purposes to provide information
for funding and tenure decisions, since it is considered an appropriate
tool for identifying "good" scientists (Ball, 2007). As a matter of fact,
several reasons explain its popularity and diffusion (Costas and Bordons,
2007). As is apparent from its definition, the h-index displays a simple
structure allowing easy computation, using data from Web of Science, Scopus
or Google Scholar, while it is robust to publications with a large or small
number of citations. In addition, the h-index may be adopted for assessing
the research performance of more complex structures, such as journals
(setting up as a competitor of the Impact Factor, see e.g. Braun et al.,
2006), groups of scholars, departments and institutions (Molinari and
Molinari, 2008) and even countries (Nejati and Hosseini Jenab, 2010).

Quite obviously, the h-index has received considerable attention from
researchers in the fields of scientometrics and information science (Van
Noorden, 2010). Even if the Hirsch index was originally introduced in a
descriptive framework, scientometricians often assume a statistical model
for the citation distribution and focus their interest on the empirical
h-index (see e.g. Glänzel, 2006). In a proper statistical perspective,
Beirlant and Einmahl (2010) have treated the empirical h-index as the
estimator of a suitable statistical functional of the citation-count
distribution. Accordingly, these authors have proven the consistency of the
empirical h-index and they have given the conditions for its large-sample
normality. In addition, Beirlant and Einmahl (2010) have provided an
expression for the large-sample variance of the empirical h-index and a
simplified formula for the same quantity when the underlying citation-count
distribution displays Pareto-type or Weibull-type tails. These two special
families have central importance in bibliometrics, since heavy-tailed
citation-count distributions are commonly assumed (see e.g. Glänzel, 2006,
and Barcza and Teles, 2009).

Beirlant and Einmahl (2010) have developed the asymptotic theory for the
empirical h-index by assuming a continuous citation-count distribution,
even if the citation number is obviously an integer. Hence,
scientometricians may demand results on the empirical Hirsch index under a
more general approach. Thus, on the basis of a suitable reformulation of
the empirical h-index, we provide distributional properties, as well as
exact expressions for the mean and variance, of the empirical h-index when
citation counts follow a distribution supported by the integers. Moreover,
the general large-sample properties of the empirical h-index are obtained
and the link between the "integer" and the "continuous" cases is fully
analyzed. In addition, a simple and consistent nonparametric estimator for
the variance of the empirical h-index is introduced under very mild
conditions. The theoretical results are then assessed in a small-sample
study by assuming some specific heavy-tailed distributions for the citation
counts. Finally, an application to the "top-ten" researchers for the Web of
Science archive in the field of Statistics and Probability during the
period 1985-2010 is carried out.

2. Definitions and preliminary results. Let X be a positive random
variable (r.v.) and let S be the corresponding survival function (s.f.),
i.e.

    $S(x) = P(X > x)$.

Even if X is a discrete r.v. in the common bibliometric applications (since
it represents the citation number for a paper of a given scholar), we
actually provide a more general approach. Similarly to Beirlant and Einmahl
(2010), it is assumed that $S(x) > 0$ for each x, since an unbounded support
for the r.v. X is usually required in scientometrics (Egghe, 2005). If the
left-hand limit of S is denoted by

    $S_-(x) = P(X \ge x)$,

on the basis of the Hirsch definition of the empirical h-index reported in
the Introduction, for each integer $n \ge 1$, a "natural" expression for the
theoretical Hirsch index $h_n$, corresponding to the law of X, is given by

(2.1)    $h_n = \sup\{x \ge 0 : n S_-(x) \ge x\}$.

Obviously, it turns out that $h_n > 0$ since S is a strictly positive
function. It is at once apparent that (2.1) encompasses the definition of
the theoretical h-index given by Beirlant and Einmahl (2010) when the r.v.
X is assumed to be continuous. Moreover, when the r.v. X is integer-valued
- the most interesting situation in bibliometrics - the theoretical Hirsch
index (2.1) reduces to the integer number defined by

(2.2)    $h_n = \max\{j \in \mathbb{N} : n S(j-1) \ge j\} = \sum_{j=1}^{n} I_{[j/n,1]}(S(j-1))$,

where $I_E$ represents the usual indicator function of a set E. It should be
remarked that $h_n \nearrow \infty$ and $h_n/n \to 0$ as $n \to \infty$, as
immediately follows from the definition (2.1).
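
As an illustration of (2.2), the following Python sketch evaluates the theoretical index for an integer-valued law by counting the indices j with n S(j-1) >= j. The function names `theoretical_h_index` and `weibull_sf` are hypothetical, and the survival function used in the example, S(j) = exp(-(j+1)^tau), is the discretized Weibull law considered in Section 5; any strictly positive survival function could be passed instead.

```python
import math


def theoretical_h_index(survival, n):
    """Evaluate (2.2): the number of j in 1..n such that n * S(j - 1) >= j."""
    return sum(1 for j in range(1, n + 1) if n * survival(j - 1) >= j)


def weibull_sf(tau):
    """Survival function S(j) = exp(-(j + 1) ** tau) of the discretized Weibull law."""
    return lambda j: math.exp(-((j + 1) ** tau))


if __name__ == "__main__":
    for tau in (0.10, 0.40):
        print(tau, theoretical_h_index(weibull_sf(tau), 50))
```

Because n S(j-1) - j is decreasing in j, the indicator in (2.2) equals one exactly for j up to h_n, so counting the indices and taking the maximum give the same value.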
Since it holds that

    $S_-(j) = P(\lfloor X \rfloor \ge j)$, $j \in \mathbb{N}$,

where $\lfloor \cdot \rfloor$ denotes the function giving the greatest
integer less than or equal to the function argument, it is worth noticing
that $\lfloor h_n \rfloor$ turns out to be the h-index corresponding to the
law of $\lfloor X \rfloor$. Indeed, from the definition (2.1) we have

    $S_-(\lfloor h_n \rfloor) \ge \lfloor h_n \rfloor / n$

and

    $S_-(1 + \lfloor h_n \rfloor) < (1 + \lfloor h_n \rfloor)/n$.

Rephrasing the previous statement in its dual setting, if X is an
integer-valued r.v., the h-index corresponding to the law of X turns out to
be the integer part of the h-index corresponding to the absolutely
continuous law of X + U, where U is a uniform r.v. on [0, 1] independent of
X.

If $X_1, \dots, X_n$ are n independent copies of X, the estimator of $h_n$,
i.e. the empirical h-index, may be immediately introduced as an empirical
functional on the basis of the definition (2.1). More precisely, the
empirical h-index is defined to be

(2.3)    $H_n = \sup\{x \ge 0 : n S_{n-}(x) \ge x\}$,

where

    $S_{n-}(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[x,\infty[}(X_i)$.

It should be remarked that (2.3) reduces to the empirical h-index defined by
Beirlant and Einmahl (2010) when the r.v. X is continuous. Moreover, in
contrast to (2.3), the expression of the empirical h-index commonly given in
the bibliometric literature (see e.g. Glänzel, 2006, p. 316) is not
consistent when the realizations of the n copies are all null. In addition,
by considering the previous discussion and from expression (2.2), the
estimator of $\lfloor h_n \rfloor$ corresponding to the law of
$\lfloor X \rfloor$ is given by

(2.4)    $\widetilde{H}_n = \sum_{j=1}^{n} I_{[j/n,1]}(S_n(j-1))$,

where $S_n$ represents the empirical s.f., i.e.

    $S_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{]x,\infty[}(X_i)$.

It is at once apparent that

(2.5)    $\widetilde{H}_n = \lfloor H_n \rfloor$,

while it turns out that $\widetilde{H}_n = H_n$ when the r.v. X is
integer-valued. Actually, estimator (2.4) constitutes the formal expression
of the empirical h-index given by Hirsch and reported in the Introduction.
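
For integer citation counts, estimator (2.4) can be computed directly. The sketch below is a minimal illustration, with hypothetical function names, that implements the sum in (2.4) literally and also the familiar rank-based shortcut; the two agree on integer data.

```python
def empirical_h_index(citations):
    """Evaluate (2.4) for a list of integer citation counts."""
    n = len(citations)
    h = 0
    for j in range(1, n + 1):
        # n * S_n(j - 1) is the number of papers with more than j - 1 citations.
        papers_above = sum(1 for c in citations if c > j - 1)
        if papers_above >= j:
            h += 1
    return h


def h_index_sorted(citations):
    """Rank-based equivalent: count the ranks whose citation count is >= the rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


if __name__ == "__main__":
    print(empirical_h_index([10, 8, 5, 4, 3]))
```

The literal version costs O(n^2) operations and is kept only because it mirrors the formula; the sorted version is preferable for long publication lists.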

It holds that $H_n \nearrow \infty$ a.s. (and hence
$\widetilde{H}_n \nearrow \infty$ a.s.) as $n \to \infty$ on the basis of the
Glivenko-Cantelli Theorem. In particular, it follows that
$E[H_n] \nearrow \infty$ and $E[\widetilde{H}_n] \nearrow \infty$ as
$n \to \infty$.

In order to achieve some useful small-sample properties for the estimator
(2.4), it should be remarked that

    $Y_{j,n} = I_{[j/n,1]}(S_n(j-1))$, $j = 1, \dots, n$,

are dependent Bernoulli random variables. More precisely, each $Y_{j,n}$
turns out to be a Bernoulli r.v. with parameter

    $p_{j,n} = E[Y_{j,n}] = P(n S_n(j-1) \ge j) = \sum_{y=j}^{n} \binom{n}{y} S(j-1)^y (1 - S(j-1))^{n-y}$.

In the sequel, we set $p_{j,n} = 0$ if $j > n$. Moreover, since it trivially
holds that

    $\mathrm{Var}[Y_{j,n}] = p_{j,n}(1 - p_{j,n})$

and

    $\mathrm{Cov}[Y_{j,n}, Y_{l,n}] = p_{j,n}(1 - p_{l,n})$

for $j > l$, it also follows that

    $E[\widetilde{H}_n] = \sum_{j=1}^{n} p_{j,n}$

and

(2.6)    $\mathrm{Var}[\widetilde{H}_n] = \sum_{j=1}^{n} p_{j,n}(1 - p_{j,n}) + 2 \sum_{l=2}^{n} p_{l,n} \sum_{j=1}^{l-1} (1 - p_{j,n}) = \sum_{j=1}^{n} r_{j,n}(1 - p_{j,n})$,

where

    $r_{j,n} = p_{j,n} + 2 \sum_{l=j+1}^{n} p_{l,n}$.
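
The exact mean and the variance (2.6) can be evaluated numerically once the survival function is known. The following sketch, with hypothetical function names, computes each p_{j,n} as a binomial tail probability and then accumulates the compact form of (2.6) based on r_{j,n}. The direct summation of binomial terms is an assumption of convenience and is adequate only for moderate n, such as the sample sizes used in Section 5.

```python
from math import comb, exp


def binomial_tail(n, j, s):
    """P(Bin(n, s) >= j), i.e. p_{j,n} when s = S(j - 1)."""
    return sum(comb(n, y) * s**y * (1.0 - s) ** (n - y) for y in range(j, n + 1))


def exact_moments(survival, n):
    """Return (mean, variance) of the empirical h-index, using (2.6)."""
    p = [binomial_tail(n, j, survival(j - 1)) for j in range(1, n + 1)]
    mean = sum(p)
    variance = 0.0
    tail = 0.0  # running value of sum_{l > j} p_{l,n}
    for p_j in reversed(p):
        r_j = p_j + 2.0 * tail  # r_{j,n} = p_{j,n} + 2 * sum_{l=j+1}^{n} p_{l,n}
        variance += r_j * (1.0 - p_j)
        tail += p_j
    return mean, variance


if __name__ == "__main__":
    # Discretized Weibull law of Section 5 with tau = 0.40 and n = 30.
    print(exact_moments(lambda j: exp(-((j + 1) ** 0.40)), 30))
```

The output for a given law can be compared against the corresponding entries of the tables in Section 5.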






Obviously, it holds that $E[\widetilde{H}_n]/E[H_n] \to 1$ as $n \to \infty$.
The behavior of $\mathrm{Var}[H_n]$ and $\mathrm{Var}[\widetilde{H}_n]$ as
$n \to \infty$ will be considered at length in Sections 3 and 4.

3. Large-sample properties of the empirical h-index. By means of
expression (2.5) and considering the discussion following expression (2.2),
in order to explore the large-sample behavior of the empirical h-index as
$n \to \infty$, laws defined on a continuous support may be handled by
considering laws concentrated on the integers and vice versa. Moreover, if
$(a_n)_n$ is an infinitesimal sequence and $\mu$ is a limiting law, it
follows that

    $a_n(H_n - h_n) \xrightarrow{d} \mu \iff a_n(\widetilde{H}_n - \lfloor h_n \rfloor) \xrightarrow{d} \mu$.

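In addition, by noting that $a_n \sim b_n$ means asymptotic equivalence of
the sequences $(a_n)_n$ and $(b_n)_n$, i.e. $\lim_n a_n/b_n = 1$ as
$n \to \infty$, if $a_n^{-2} \sim \mathrm{Var}[\widetilde{H}_n]$ and
$\lim_n \mathrm{Var}[\widetilde{H}_n] = \infty$, we have

    $\mathrm{Var}[H_n] \sim \mathrm{Var}[\widetilde{H}_n]$

and

    $a_n(H_n - h_n) \xrightarrow{d} \mu \iff \frac{H_n - h_n}{\sigma_n} \xrightarrow{d} \mu \iff \frac{\widetilde{H}_n - h_n}{\hat\sigma_n} \xrightarrow{d} \mu$,

where $\sigma_n^2 = \mathrm{Var}[\widetilde{H}_n]$ and $\hat\sigma_n^2$ is a
consistent estimator of $\mathrm{Var}[\widetilde{H}_n]$, i.e.

    $\hat\sigma_n^2 / \mathrm{Var}[\widetilde{H}_n] \xrightarrow{P} 1$.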

Hence, in order to implement confidence sets for $h_n$, the evaluation and
the estimation of $\mathrm{Var}[\widetilde{H}_n]$ are of central importance
in the most interesting case for scientometricians, i.e. when
$\mathrm{Var}[\widetilde{H}_n] \to \infty$ as $n \to \infty$. For example,
this setting occurs for the Pareto-type family of laws satisfying the
condition

    $S(x) = x^{-\alpha} l(x)$

with $\alpha \in ]0, \infty[$ and for the Weibull-type family of laws
satisfying the condition

    $S(x) = \exp(-x^{\tau} l(x))$

with $\tau \in ]0, 1/2[$, where l is a slowly-varying function, i.e.

    $l(tx)/l(t) \to 1$

for each x as $t \to \infty$. Since the variance (2.6) is a function of the
probabilities $p_{j,n}$, the preliminary step consists in determining tight
inequalities for these quantities, as the following result provides.

Theorem 3.1. If G represents the s.f. of the standard Normal distribution,
there exists a constant $A > 0$ such that, for each $n \ge 1$ and
$j = 1, \dots, n$, it holds that

(3.1)    $|p_{j,n} - G(x_{j,n})| \le A \, \frac{v_{j,n}^3 + 1}{v_{j,n}^4 (1 + |x_{j,n}|)^6}$,

where

    $x_{j,n} = \frac{j - nS(j-1)}{v_{j,n}}$

and

    $v_{j,n}^2 = nS(j-1)(1 - S(j-1))$.



Corollary 3.1. There exists a constant $C > 0$, solely depending on A, such
that

(3.2)    $\sum_{j=\lfloor 2h_n \rfloor + 1}^{n} r_{j,n} \le \frac{C}{h_n^{3/2}}$

for each n and $h_n \ge 1$.


The further Corollary to Theorem 3.1 gives the consistency of the estimator
$\widetilde{H}_n$.

Corollary 3.2. The ratio $\widetilde{H}_n/h_n$ converges in quadratic mean
to 1, i.e. it holds that

    $\frac{\widetilde{H}_n}{h_n} \xrightarrow{L^2} 1$

as $n \to \infty$.

Similarly to the framework considered by Beirlant and Einmahl (2010), the
previous consistency result is stated in a ratio-setting since
$h_n \nearrow \infty$ as $n \to \infty$. Finally, on the basis of Corollary
3.2 it also follows that

    $\frac{E[\widetilde{H}_n]}{h_n} \to 1$

as $n \to \infty$.

4. Consistent estimation of the empirical h-index variance. As emphasized
in Section 2, in order to achieve the convergence in distribution of
$\widetilde{H}_n$, the evaluation of the variance (2.6) is central. To this
aim, the following result provides some inequalities and a useful asymptotic
equivalence for (2.6) by assuming a mild condition.

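Theorem 4.1. For each n it holds that

(4.1)    $\mathrm{Var}[\widetilde{H}_n] \ge \sum_{j=1}^{\lfloor 2h_n \rfloor} r_{j,n}(1 - p_{j,n}) \ge \sum_{j=1}^{\lfloor 2h_n \rfloor} \bar{r}_{j,n}(1 - p_{j,n})$,

where

    $\bar{r}_{j,n} = p_{j,n} + 2 \sum_{l=j+1}^{\lfloor 2h_n \rfloor} p_{l,n}$.

Moreover, if

(4.2)    $\liminf_n \mathrm{Var}[\widetilde{H}_n] > 0$,

it holds that

(4.3)    $\mathrm{Var}[\widetilde{H}_n] \sim \sum_{j=1}^{\lfloor 2h_n \rfloor} r_{j,n}(1 - p_{j,n})$

as $n \to \infty$. In particular, if

    $V_n = \sum_{j=1}^{\lfloor 3H_n \rfloor} p_{j,n}(1 - p_{j,n}) + 2 \sum_{l=2}^{\lfloor 3H_n \rfloor} p_{l,n} \sum_{j=1}^{l-1} (1 - p_{j,n})$,

it holds that

    $R_n = \frac{V_n}{\mathrm{Var}[\widetilde{H}_n]} \xrightarrow{P} 1$

as $n \to \infty$.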



From Theorem 4.1, it is at once apparent that $V_n$ would be a consistent
estimator of (2.6) when it is possible to evaluate the $p_{j,n}$'s for
$j \le \lfloor 3H_n \rfloor$ and in the case that condition (4.2) holds, i.e.
when $\mathrm{Var}[\widetilde{H}_n]$ does not approach 0 as $n \to \infty$.
Hence, it is convenient to introduce a further condition which solely
involves the behavior of S on $\mathbb{N}$ and which implies condition (4.2).
More precisely, we consider the condition

(4.4)    $\lim_n \frac{\sqrt{n}\,\psi(n)}{S(n)} = 0$,

where

    $\psi(n) = S(n-1) - S(n) = P(n-1 < X \le n)$.

Obviously, when the r.v. X is integer-valued, $\psi$ represents the
probability function corresponding to X. It should be noticed that condition
(4.4) may also be reformulated as

    $\frac{S(n-1)}{S(n)} = 1 + \frac{\gamma_n}{\sqrt{n}}$,

where $(\gamma_n)_n$ is a positive infinitesimal sequence, and hence for
each $M > 0$, it holds

(4.5)    $\lim_n \frac{S(n - M\sqrt{n})}{S(n)} = 1$,



since 



< 



exp 



2(M + l)VnA 



n — My/n — 1 



where $\delta_n = \sup_{h \ge n} \gamma_h$. As proven in the following
result, condition (4.4) ensures the unboundedness of (2.6) as $n \to \infty$.

Theorem 4.2. If the law of X satisfies condition (4.4), it holds that

    $\lim_n \mathrm{Var}[\widetilde{H}_n] = \infty$.

In order to achieve consistent estimation of (2.6), it is necessary to
introduce a further condition, which is slightly more restrictive than
condition (4.4). More precisely, this condition assumes that for each
$M > 0$ it holds that



(4.6) lim sup 



m 1 



0. 
where $D_{M,n} = [n - M\sqrt{n}, n + M\sqrt{n}] \cap \mathbb{N}$. It may be
easily verified that condition (4.6) implies condition (4.4) and hence
condition (4.5).

Since a natural estimator for $p_{j,n}$ is given by

    $\hat{p}_{j,n} = \sum_{y=j}^{n} \binom{n}{y} S_n(j-1)^y (1 - S_n(j-1))^{n-y}$,


on the basis of the large-sample behavior of the ratio $R_n$ given in
Theorem 4.1, an estimator for the variance (2.6) turns out to be

(4.7)    $\widehat{V}_n = \sum_{j=1}^{\min(\lfloor 3H_n \rfloor, n)} \hat{p}_{j,n}(1 - \hat{p}_{j,n}) + 2 \sum_{l=2}^{\min(\lfloor 3H_n \rfloor, n)} \hat{p}_{l,n} \sum_{j=1}^{l-1} (1 - \hat{p}_{j,n})$.

It should be remarked that estimator (4.7) is fully nonparametric. Indeed, it 
does not require the specification of a semi-parametric model for the under- 
lying citation distribution as in the case of the variance estimator proposed 
by Beirlant and Einmahl (2010). For example, their estimator requires the 
estimation of the Paretian index when a Pareto-type family is assumed for 
the law of X - a non-trivial task, see e.g. Beirlant et al. (2004). 
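
A minimal Python sketch of estimator (4.7) for integer citation counts is given below. The function names are hypothetical, and the plug-in probabilities are obtained by direct summation of binomial terms with the empirical survival function in place of S, which is adequate for the moderate sample sizes considered here.

```python
from math import comb


def binomial_tail(n, j, s):
    """P(Bin(n, s) >= j); with s = S_n(j - 1) this is the plug-in estimate of p_{j,n}."""
    return sum(comb(n, y) * s**y * (1.0 - s) ** (n - y) for y in range(j, n + 1))


def empirical_h_index(citations):
    """Empirical h-index of integer citation counts."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def variance_estimate(citations):
    """Nonparametric variance estimate following (4.7)."""
    n = len(citations)
    if n == 0:
        return 0.0
    upper = min(3 * empirical_h_index(citations), n)
    p_hat = []
    for j in range(1, upper + 1):
        s_n = sum(1 for c in citations if c > j - 1) / n  # S_n(j - 1)
        p_hat.append(binomial_tail(n, j, s_n))
    first = sum(p * (1.0 - p) for p in p_hat)
    second = 0.0
    slack = 0.0  # running value of sum_{j < l} (1 - p_hat_j)
    for p_l in p_hat:
        second += p_l * slack
        slack += 1.0 - p_l
    return first + 2.0 * second


if __name__ == "__main__":
    print(variance_estimate([10, 8, 5, 4, 3, 1, 0, 0]))
```

The only inputs are the observed citation counts, which reflects the fully nonparametric nature of the estimator: no tail index or other model parameter has to be supplied.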

The following result provides a compact asymptotic equivalent expression 
for (2.6) and the consistency of estimator (4.7) if condition (4.6) is assumed. 

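Theorem 4.3. If the law of X satisfies condition (4.6), it holds that

(4.8)    $\mathrm{Var}[\widetilde{H}_n] \sim \frac{h_n}{(1 + n\psi(\lfloor h_n \rfloor))^2}$

as $n \to \infty$. Moreover, it follows that

    $\frac{\widehat{V}_n}{\mathrm{Var}[\widetilde{H}_n]} \xrightarrow{P} 1$

as $n \to \infty$.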

It should be remarked that for the Pareto-type and the Weibull-type families
(described in Section 3) condition (4.6) is satisfied. Accordingly,
$\widetilde{H}_n$ approaches normality and from Theorem 4.3 for the
Pareto-type family, it holds that

    $\mathrm{Var}[\widetilde{H}_n] \sim \frac{h_n}{(1 + \alpha)^2}$

and

    $h_n \sim C n^{1/(1+\alpha)}$,

when $l(x) \sim C^{1+\alpha}$, while for the Weibull-type family, it holds
that

    $\mathrm{Var}[\widetilde{H}_n] \sim \frac{h_n}{(1 + \tau \log(n/h_n))^2}$

and

    $h_n \sim C (\log n)^{1/\tau}$,

when $l(x) \sim C^{-\tau}$ and where $C > 0$ is a suitable constant. These
results are in complete agreement with the findings by Beirlant and Einmahl
(2010).
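
The two large-sample variance approximations are simple enough to evaluate by hand, and the short sketch below (hypothetical function names) does so in Python. The inputs in the example are theoretical h-index values taken from the tables of Section 5.

```python
import math


def pareto_variance_approx(h_n, alpha):
    """Approximation h_n / (1 + alpha)^2 for the Pareto-type family."""
    return h_n / (1.0 + alpha) ** 2


def weibull_variance_approx(h_n, n, tau):
    """Approximation h_n / (1 + tau * log(n / h_n))^2 for the Weibull-type family."""
    return h_n / (1.0 + tau * math.log(n / h_n)) ** 2


if __name__ == "__main__":
    print(round(pareto_variance_approx(11, 0.25), 2))
    print(round(weibull_variance_approx(17, 50, 0.01), 2))
```

Both formulas require the tail parameter (alpha or tau), so in practice they can only be used after that parameter has been estimated.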

On the basis of Theorem 4.3 and of the remarks contained in Section 2, when
condition (4.6) is satisfied by the underlying distribution, a large-sample
confidence set for $h_n$ at the $(1 - \gamma)$ confidence level turns out to
be

(4.9)    $C_n = \{[\widetilde{H}_n - z_{1-\gamma/2}\sqrt{\widehat{V}_n}], \dots, [\widetilde{H}_n + z_{1-\gamma/2}\sqrt{\widehat{V}_n}]\}$,

where $z_\gamma$ represents the $\gamma$-th quantile of the standard Normal
distribution, while $[\cdot]$ represents the function giving the integer
closest to the argument.
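
The confidence set (4.9) can be sketched in a few lines of Python. The function names are hypothetical, the standard library `statistics.NormalDist` supplies the Normal quantile, and ties in the nearest-integer rounding are resolved upwards, which is an assumption since the text does not specify a tie rule. The numerical inputs in the example are purely illustrative.

```python
import math
from statistics import NormalDist


def nearest_integer(x):
    """Integer closest to x (ties rounded upwards, by assumption)."""
    return math.floor(x + 0.5)


def confidence_set(h_index, variance_estimate, gamma=0.05):
    """Return the end points (lower, upper) of the set C_n in (4.9)."""
    z = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    half_width = z * math.sqrt(variance_estimate)
    return nearest_integer(h_index - half_width), nearest_integer(h_index + half_width)


if __name__ == "__main__":
    # Illustrative inputs: empirical h-index 46 and variance estimate 4.0.
    print(confidence_set(46, 4.0))
```

The set consists of all integers between the two returned end points, inclusive.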
In addition, in order to assess the homogeneity of the theoretical h-indexes
for two scholars, a suitable test statistic is given by

    $T_n = \frac{\widetilde{H}_{1,n} - \widetilde{H}_{2,n}}{\sqrt{\widehat{V}_{1,n} + \widehat{V}_{2,n}}}$,

where $\widetilde{H}_{1,n}$ and $\widetilde{H}_{2,n}$ represent the empirical
h-indexes corresponding to the scholars, while $\widehat{V}_{1,n}$ and
$\widehat{V}_{2,n}$ are the respective variance estimators as given by (4.7).
It is at once apparent that

    $T_n \xrightarrow{d} N(0, 1)$

as $n \to \infty$, when $\widetilde{H}_{1,n}$ and $\widetilde{H}_{2,n}$
approach normality. The test statistic $T_n$ is defined in a nonparametric
setting, in contrast to the test statistic proposed in a semiparametric
approach by Beirlant and Einmahl (2010), which requires consistent
estimation of the two Paretian indexes of the scholar citation
distributions.
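
A minimal sketch of the homogeneity test based on the statistic above follows. The function name is hypothetical, the two-sided p-value is an added convention not stated in the text, and the numerical inputs are illustrative only.

```python
import math
from statistics import NormalDist


def homogeneity_test(h1, v1, h2, v2):
    """Return (T, two-sided p-value) for two empirical h-indexes and their variance estimates."""
    t = (h1 - h2) / math.sqrt(v1 + v2)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p_value


if __name__ == "__main__":
    print(homogeneity_test(46, 4.0, 39, 12.0))
```

The Normal reference distribution is a large-sample approximation, so the p-value should be read with caution when the publication lists are short.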

Finally, when the analysis of the theoretical h-indexes corresponding to k
scholars (k > 2) is considered, simultaneous set estimation and homogeneity
hypothesis testing could be handled by means of techniques similar to those
suggested in Marcheselli (2003). These issues will be pursued in future
research.

5. Some studies and examples. In order to assess in practice the properties
of the empirical h-index achieved in the previous sections, a study was
carried out for two heavy-tailed distributions. First, a discrete stable
distribution for the r.v. X was considered. This distribution may be
specified via the probability generating function

    $g(s) = E[s^X] = \exp(-\lambda(1 - s)^{\alpha})$, $s \in [0, 1]$,

where $\alpha \in ]0, 1]$ and $\lambda \in ]0, \infty[$ (Steutel and van
Harn, 2004, p. 265). The discrete stable distribution is Paretian of order
$\alpha$ for $\alpha \in ]0, 1[$ (Christoph and Schreiber, 1998) and it
constitutes a flexible and natural model for heavy-tailed discrete data (see
Marcheselli et al., 2008, for a description of the distribution properties
and of the corresponding parameter estimation issues). A "discretized"
Weibull distribution was subsequently assumed for the r.v. X. The
distribution displays the probability function

    $f(x) = [\exp(-x^{\tau}) - \exp(-(x+1)^{\tau})] \, I_{\mathbb{N}}(x)$,

where $\tau \in ]0, \infty[$. Obviously, it turns out that

    $S(j) = \exp(-(j+1)^{\tau})$, $j \in \mathbb{N}$.

By assuming that n = 30, 50, 100, 150, 200, the values of $h_n$,
$E[\widetilde{H}_n]$, $\mathrm{Var}[\widetilde{H}_n]$ and of the large-sample
variance approximation (4.8) were computed for the discrete stable
distribution with parameter vectors $(\alpha, \lambda)$ = (0.25, 1.0),
(0.50, 1.5), (0.75, 2.0), as well as for the discretized Weibull
distribution with parameters $\tau$ = 0.01, 0.10, 0.40. These choices were
made in order to fit, as closely as possible, the real productivity and the
real citation distributions of scholars with different scientific ages,
belonging to different research areas and with more or less pronounced
impact on research.

In the study, B = 10,000 random variates were generated for each choice of n
and for each considered distribution in order to achieve the Monte Carlo
estimation of $E[\widehat{V}_n]$ and the Monte Carlo estimation of the actual
coverage of the confidence set (4.9) at the 95% nominal confidence level.
The corresponding results are reported in Tables I and II.
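
The simulation design can be mimicked on a small scale. The sketch below is not the authors' code: it draws discretized Weibull variates by inversion, using the fact that the integer part of E^(1/tau) has survival function exp(-(j+1)^tau) when E is a standard exponential variate, and it averages the empirical h-index over the replications. The function names, the seed and the reduced number of replications are assumptions made for illustration.

```python
import math
import random


def sample_discretized_weibull(tau, rng):
    """Draw one variate with S(j) = exp(-(j + 1) ** tau) by inversion."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so that 1 / u is finite
    return math.floor(math.log(1.0 / u) ** (1.0 / tau))


def empirical_h_index(citations):
    """Empirical h-index of integer citation counts."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def monte_carlo_mean_h(tau, n, replications, seed=0):
    """Monte Carlo estimate of the expected empirical h-index."""
    rng = random.Random(seed)
    total = 0
    for _ in range(replications):
        sample = [sample_discretized_weibull(tau, rng) for _ in range(n)]
        total += empirical_h_index(sample)
    return total / replications


if __name__ == "__main__":
    print(monte_carlo_mean_h(tau=0.40, n=30, replications=2000, seed=1))
```

The same loop can be extended to record, for each replication, a variance estimate and whether the resulting confidence set contains the theoretical index, which gives Monte Carlo estimates of the remaining quantities considered in the study.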

The analysis of these tables shows that $h_n$ and $E[\widetilde{H}_n]$ are
similar even for small values of n and that $\widehat{V}_n$ turns out to be
nearly unbiased. In addition, the actual coverage of the large-sample
confidence set (4.9) is almost equivalent to the nominal coverage even for
small values of n. Unfortunately, the large-sample variance approximation
(4.8) may be quite dissimilar from $\mathrm{Var}[\widetilde{H}_n]$ even for
quite large values of n. It should be remarked that an estimation procedure
based on (4.8) requires, in any case, the additional estimation of $\alpha$
or $\tau$.

Accordingly, $\widehat{V}_n$ seems to be an appealing variance estimator,
both from a theoretical and a practical perspective. In general, we have
verified similar conclusions for a plethora of distributions satisfying
condition (4.6), even if we have not reported the corresponding results.



Table I. Discrete stable distribution.

  alpha  lambda    n    h_n   E[H_n]   Var[H_n]   h_n/(1+alpha)^2   E[V_n]   Coverage
  0.25    1.0     30    11    11.31      4.73          7.04           4.88     0.96
                  50    17    17.17      7.58         10.88           7.79     0.96
                 100    30    30.28     14.18         19.20          14.51     0.96
                 150    42    42.19     20.34         26.88          20.73     0.95
                 200    53    53.38     26.21         33.92          26.74     0.95
  0.50    1.5     30     9     8.96      2.66          4.00           2.96     0.95
                  50    12    12.46      4.01          5.33           4.35     0.96
                 100    19    19.59      6.83          8.44           7.25     0.96
                 150    25    25.57      9.25         11.11           9.72     0.96
                 200    31    30.91     11.43         13.78          11.98     0.95
  0.75    2.0     30     6     6.65      1.11          1.96           1.38     0.97
                  50     8     8.38      1.58          2.61           1.90     0.96
                 100    11    11.67      2.57          3.59           2.93     0.96
                 150    14    14.29      3.38          4.57           3.76     0.95
                 200    16    16.54      4.09          5.22           4.54     0.95

Table II. Discretized Weibull distribution.

  tau     n    h_n   E[H_n]        Var[H_n]      h_n/(1+tau log(n/h_n))^2   E[V_n]        Coverage
  0.01    30    10   10.77         [illegible]          9.78               [illegible]   0.94
          50    17   17.86         11.25               16.64               11.05         0.96
         100    35   35.47         22.43               34.28               22.25         0.96
         150    52   [illegible]   [illegible]         50.92               [illegible]   [illegible]
         200    70   70.44         44.70               68.55               [illegible]   [illegible]
  0.10    30     8   [illegible]   [illegible]          6.24                4.90         [illegible]
          50    13   13.60          7.83               10.10                7.82         0.95
         100    25   25.09         14.59               19.28               14.67         0.95
         150    35   35.83         20.95               26.67               21.12         0.95
         200    46   46.07         27.05               34.97               27.20         0.95
  0.40    30     4    4.47          1.40                1.23                1.49         0.97
          50     6    6.01          1.72                1.76                1.85         0.96
         100     8    8.74          2.22                1.98                2.38         0.96
         150    11   10.75          2.54                2.63                2.71         0.96
         200    12   12.38          2.77                2.66                2.96         0.96


As a practical application of the achieved results, we also considered the
scientific performance of the best ten scholars in the field of Statistics
and Probability according to the Web of Science archive. Data were drawn
from the Thomson-Reuters databases by selecting the scholars listed in the
category Mathematics of the ISIHighlyCited.com database (given at the
website http://hcr3.isiknowledge.com/home.cgi). For each scholar in the
database, an author search was performed during the month of December 2010
on the ISI Web of Science for the period 1985-2010. The search was carried
out in such a way that only articles and letters published in journals
contained in the Statistics and Probability database were considered.
Accordingly, the citation counts were collected for each scholar. The
citation counts covered documents contained in the Science Citation Index
Expanded, the Social Sciences Citation Index and the Arts & Humanities
Citation Index. Finally, the ten scholars with the highest h-indexes were
considered. More precisely, the names of the ten scholars, the corresponding
paper number and h-index, as well as the large-sample confidence sets at the
95% nominal confidence level are reported in Table III. Practitioners may
benefit from this example in order to understand the importance of
quantifying variability for an appropriate comparison analysis of research
performance.

Table III. Performance of the "top-ten" most-cited scholars
in the field of Statistics and Probability during the period 1985-2010.

  Scholar            n    h-index   C_n
  Hall, P.G.        418     46      {42,...,50}
  Rubin, D.B.       104     39      {32,...,46}
  Carroll, R.J.     198     38      {33,...,43}
  Tibshirani, R.    104     37      {31,...,43}
  Fan, J.           114     36      {30,...,42}
  Marron, J.S.      107     36      {31,...,41}
  Hastie, T.J.       77     34      {27,...,41}
  Lin, D.Y.          93     32      {26,...,38}
  Raftery, A.E.      88     31      {25,...,37}
  Wei, L.J.          88     31      {26,...,36}

APPENDIX A

A.1. Proof of Theorem 3.1. For fixed j and n, let us assume that

    $Z_i = \frac{I_{]j-1,\infty[}(X_i) - S(j-1)}{\sqrt{S(j-1)(1 - S(j-1))}}$, $i = 1, \dots, n$.

Accordingly, we have

    $p_{j,n} = P\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{n} Z_i \ge x_{j,n}\right)$.

Thus, by applying the Osipov inequality (see e.g. DasGupta, 2008, p. 659) to
the $Z_i$'s for $\delta = 6$, inequality (3.1) is proven, since for each
$m \ge 2$ it holds that

    $E[|Z_i|^m] \le \frac{1}{[S(j-1)(1 - S(j-1))]^{(m-2)/2}}$. □

A.2. Proof of Corollary 3.1. Since for each n and $j \ge 2h_n + 1$, it
holds that

(A , } ^_ i(l _^), j(l .«), f 

and since from the definition of $h_n$ it follows that

(A.2)    $v_{j,n}^2 \le nS(j-1) < j$

for $j \ge 2h_n + 1$, by means of inequality (3.1) we have

(A.3) |p i)n - G(s i)n )| < ^ 7 6 6 7 ' < 64A < — — . 

U j,n- L j,n J J 

Since on the basis of (A.l) and (A. 2) it also holds that 



2 ' 

and hence G(xj^ n ) < G(y/j/2), for each I > 2h n it follows from (A.3) 



1=3+1 J 1=3+1 J 

where B is a constant, solely depending on A. Thus, it also turns out that 

A B AB 

1^ r J> - 7572 + 773/2 

j=[2h n \+l n n olln 

and hence inequality (3.2) follows. □ 

A.3. Proof of Corollary 3.2. If S is a continuous function, from Corollary
1 given by Beirlant and Einmahl (2010) it holds that

    $\frac{H_n}{h_n} \xrightarrow{P} 1$

as $n \to \infty$. When S is not a continuous function, let $h'_n$ be the
theoretical h-index corresponding to the law of $\lfloor X \rfloor + U$,
where U is a uniform r.v. on [0, 1] independent of X, while let $H'_n$ be
the empirical h-index based on n independent copies of
$\lfloor X \rfloor + U$.

Since $|h_n - h'_n| \le 1$ and $|\widetilde{H}_n - H'_n| \le 1$, the
convergence in probability to 1 of $\widetilde{H}_n/h_n$ is in turn obtained
from the continuous-setting result. Moreover, the uniform integrability of
$\widetilde{H}_n^2/h_n^2$ follows by considering inequality (3.2) since



E 



E y 3 > 

\j=[2h n ]+l 



Var 



E 

j=\2h n \+l 



+ E 



E 

j=[2h n j+l 
2 



< E r ^+\ E 



and 



j=[2h n \+l 



[2h n \ 



K j=[2h n \+l 



□ 



A.4. Proof of Theorem 4.1. The inequalities in (4.1) easily follow from
expression (2.6), while (4.3) follows from (3.2) and

    $\sum_{j=1}^{\lfloor 2h_n \rfloor} |\bar{r}_{j,n} - r_{j,n}| = 2 \lfloor 2h_n \rfloor \sum_{l=\lfloor 2h_n \rfloor + 1}^{n} p_{l,n}$.

Moreover, on the basis of Corollary 3.2 it turns out that $H_n/n$ converges
in mean to 0. Hence, it is convenient to consider

    $R'_n = I_{[0,n]}(3H_n) R_n$,

for which it holds that $R'_n \le 1$ by means of (2.6). Hence, the second
part of the Theorem follows from (4.1), (4.3) and

    $\lim_n P(\lfloor 3H_n \rfloor < \lfloor 2h_n \rfloor) = 0$. □

A.5. Proof of Theorem 4.2. By using the notations introduced in Theorem
4.1, we have



1 + nip(j) V j:n - Vj+l,n 



v j+l,n v j+l,n 

Thus, for a given c > and for each n such that h n > 1, on the basis of 
(4.5) it holds that 

1 + nip{j) J_ 

for j E [h n — s/h^i, h n + \/h^\ H INT. Equivalently, there exist at least c values 
Xj >n in the interval [—1,1]. 

Thus, if D n represents the set of indexes j £ [h n — y/h^, h n + y/E^\ fl IN 
for which \xj tTl \ < 1, on the basis of (3.1) and (4.1) it follows that 

\2h n \ 

V&l[H n ] > PjA 1 ~Pj,n) > PjA 1 -Pj,n) 

> cG(l)(l - G(l)) - A £ '> ' ' 

j£D n V jA L + \ X 0M) 

From (4.5) we have 



inf Vj ;7l ~ \JnS(h n - \/hn) ~ JnS(h n ) ~ \fh^ , 
and, since c is arbitrary, it holds that 



Urn inf Var[HJ > cG(l)(l - G(l)) - 441ml sup — 

n n mfj e D n 



= cG(l)(l-G(l))-4A 
which completes the proof. 



□ 



A.6. Proof of Theorem 4.3. For a fixed $M > 0$, from condition (4.6) and
from (4.5) it follows that



(A.4) 



lim sup 



(1 + ni/j(j))v hj} 



(I + m/;([h n \))v 



'j+l,n 



0. 
Thus, by means of expression (A.4), from Theorem 4.2 it follows that

[2/lnJ [2h n \ l-l 

Var[H n ] ~ Var[F n ] ~ ^ Pj,nO- ~ Vj,n) + 2 Pl,n ~ Pj,n) 

3=1 1=2 j=l 

[2h n \ l-l 

~ 2 H Pl,n ^Z{ l -Pj,n) 
1=2 j=l 

2h f 2hn f x 

~(l + n^jW-oo G(a)d*y_Jl-G(«))d« 



(l + m/>(LM)) 2 



since 



/OO fX 1 

G(x)dx / (1 -G(n))dn= -. 



Hence, expression (4.8) is proven. As to the consistency of V n , by assuming 
that 



it holds that 



sup 

j'6% ,hr, 



i>n{j) = s n (j - 1) - S n {j), 

(1 +n^ n (j)H n ,n n 



(l + n^(LVI))v. 



'j+l,n 



Ao, 



as n — )• oo, since 



inf limsupP(|i? n - h n \ > My\) = 0. 
Thus, convergence in probability of
$\widehat{V}_n/\mathrm{Var}[\widetilde{H}_n]$ to 1 follows. □